Load Balancing

When NGINX acts as a reverse proxy, it can distribute incoming requests across multiple backend servers to achieve:

  • High availability
  • Better performance
  • Scalability
  • Fault tolerance

This is done using an upstream block.

Basic Upstream Configuration

upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;
}

By default, NGINX uses round-robin.

Round-Robin Load Balancing (Default)

Requests are distributed sequentially across backend servers:

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A

Each request goes to the next server in order.

Example Configuration

upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
    }
}
  • First request → 10.0.0.11
  • Second request → 10.0.0.12
  • Third request → 10.0.0.11
  • Even distribution over time
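The rotation described above can be sketched in a few lines of Python (purely illustrative; NGINX implements this internally in C):

```python
from itertools import cycle

# Toy sketch of round-robin rotation over the two backends from the
# example config above; NGINX's real implementation differs internally,
# but the resulting request pattern is the same.
backends = ["10.0.0.11:8080", "10.0.0.12:8080"]
rotation = cycle(backends)

picks = [next(rotation) for _ in range(3)]
print(picks)  # ['10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.11:8080']
```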

Weighted Round-Robin

upstream app_backend {
    server 10.0.0.11 weight=3;
    server 10.0.0.12 weight=1;
}
  • Server 10.0.0.11 receives 3 of every 4 requests (75%)
  • Server 10.0.0.12 receives 1 of every 4 (25%)
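The 3:1 split can be illustrated with a small Python simulation of a "smooth" weighted round-robin (a sketch of the idea, not NGINX's actual implementation):

```python
# Toy smooth weighted round-robin: each round, every server's running
# score grows by its weight; the highest score wins and then pays back
# the total weight. Over time, picks match the weight ratio.
def smooth_wrr(servers, n):
    current = {name: 0 for name in servers}
    total = sum(servers.values())
    picks = []
    for _ in range(n):
        for name, weight in servers.items():
            current[name] += weight
        winner = max(current, key=current.get)
        current[winner] -= total
        picks.append(winner)
    return picks

picks = smooth_wrr({"10.0.0.11": 3, "10.0.0.12": 1}, 8)
print(picks.count("10.0.0.11"), picks.count("10.0.0.12"))  # 6 2
```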

Pros

  • Simple
  • Efficient for similar backends
  • Default behavior

Cons

  • Does not consider current load
  • Not ideal for long-running requests

Least Connections (least_conn)

Each new request is sent to the backend with the fewest active connections.

Server A → 10 active connections
Server B → 3 active connections
→ New request goes to Server B

Example Configuration

upstream app_backend {
    least_conn;

    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}
  • NGINX tracks the number of active connections per server
  • Each new request goes to the least busy server
  • Excellent for uneven or long-lived requests
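The selection rule reduces to "pick the minimum active count", as this toy Python sketch shows (the connection counts are invented for illustration; NGINX tracks the real ones internally):

```python
# Hypothetical active-connection counts per backend.
active = {"10.0.0.11:8080": 10, "10.0.0.12:8080": 3}

def pick_least_conn(active):
    # Choose the backend with the fewest active connections.
    return min(active, key=active.get)

chosen = pick_least_conn(active)
active[chosen] += 1  # the new request occupies one more connection
print(chosen)  # 10.0.0.12:8080
```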

Weighted Least Connections

upstream app_backend {
    least_conn;

    server 10.0.0.11 weight=2;
    server 10.0.0.12 weight=1;
}

NGINX factors each server's weight into the comparison, so a higher-weight server is allowed proportionally more active connections.
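One way to picture the weighted comparison is "active connections divided by weight" — a simplification of NGINX's internals, shown here as a Python sketch with invented numbers:

```python
# Hypothetical state: 10.0.0.11 has weight 2 and more active
# connections, but per unit of weight it is still less loaded.
servers = {
    "10.0.0.11": {"weight": 2, "active": 5},
    "10.0.0.12": {"weight": 1, "active": 3},
}

def pick_weighted_least_conn(servers):
    # Lower active/weight ratio wins: 5/2 = 2.5 beats 3/1 = 3.0.
    return min(servers, key=lambda s: servers[s]["active"] / servers[s]["weight"])

print(pick_weighted_least_conn(servers))  # 10.0.0.11
```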

Pros

  • Adapts to load
  • Ideal for slow APIs or streaming
  • Reduces overload

Cons

  • Slightly more overhead
  • Doesn’t track CPU or memory usage

IP Hash (ip_hash)

Client IP address is hashed to select a backend server.

Client IP → Hash → Server

Same client IP always maps to the same server (as long as it’s available).

upstream app_backend {
    ip_hash;

    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}
  • Client 203.0.113.10 → Server A
  • Client 203.0.113.10 → Server A again
  • Enables session persistence (sticky sessions)
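A toy Python version of the hash-then-map idea (note: NGINX's ip_hash actually hashes only the first three octets of an IPv4 address; this sketch hashes the whole string for brevity):

```python
import hashlib

backends = ["10.0.0.11:8080", "10.0.0.12:8080"]

def pick_by_ip(client_ip):
    # Hash the client address and map it onto the backend list.
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

first = pick_by_ip("203.0.113.10")
second = pick_by_ip("203.0.113.10")
print(first == second)  # True: the same client always hits the same backend
```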

Use Case

  • Applications that store sessions in memory
  • Legacy systems without shared session storage

Limitations

  • Uneven distribution when many clients share an address behind NAT
  • Weight support for ip_hash was only added in NGINX 1.3.1/1.2.2
  • Adding or removing servers remaps most clients

Pros

  • Simple session persistence
  • No cookies required

Cons

  • Poor distribution with many clients behind NAT
  • Clients may be remapped when servers are added or removed

Choosing the Right Method

Scenario                 Best Method
Identical backends       Round-robin
Long-running requests    Least connections (least_conn)
In-memory sessions       IP hash (ip_hash)
Modern apps              least_conn + shared session storage

Real-World Production Example

upstream web_backend {
    least_conn;

    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://web_backend;

        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
  • Least-loaded server gets traffic
  • Failed servers temporarily removed
  • Headers preserve client identity

Health & Failover Behavior

NGINX:

  • Marks a server as unavailable after max_fails failed attempts within fail_timeout
  • Skips it for the duration of fail_timeout
  • Tries it again once fail_timeout expires, restoring it on success
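This passive failure handling can be modeled with a small Python class (an illustrative sketch assuming the max_fails=3 and fail_timeout=30s values from the example above; NGINX's real bookkeeping is per-worker and more involved):

```python
import time

MAX_FAILS = 3
FAIL_TIMEOUT = 30  # seconds

class Backend:
    def __init__(self, addr):
        self.addr = addr
        self.fails = 0
        self.down_until = 0.0

    def record_failure(self, now=None):
        now = now if now is not None else time.monotonic()
        self.fails += 1
        if self.fails >= MAX_FAILS:
            # Too many failures: skip this server for fail_timeout.
            self.down_until = now + FAIL_TIMEOUT
            self.fails = 0

    def available(self, now=None):
        now = now if now is not None else time.monotonic()
        return now >= self.down_until

b = Backend("10.0.0.11:8080")
for _ in range(3):
    b.record_failure(now=100.0)
print(b.available(now=100.0))  # False: marked failed after 3 failures
print(b.available(now=131.0))  # True: eligible again after fail_timeout
```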